Analyzers
The analyzer ES uses by default handles Chinese poorly and is a bad fit for Chinese-language sites. To get better search results, the default analyzer needs to be replaced with one that tokenizes Chinese properly.
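To see the problem concretely, you can ask ES to analyze Chinese text with the built-in standard analyzer. This is a sketch, assuming an ES instance reachable on localhost:9200; the sample text is the same one used in the test later in this post:

```shell
# Analyze Chinese text with the built-in "standard" analyzer.
# Assumes ES is reachable on localhost:9200 (adjust for your setup).
BODY='{"analyzer":"standard","text":"这个年轻人不简单"}'
curl -s -H 'Content-Type: application/json' \
     -X GET 'http://localhost:9200/_analyze' -d "$BODY"
# The standard analyzer typically emits one token per Chinese character
# (这 / 个 / 年 / 轻 / 人 ...), losing word boundaries entirely.
```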
Using the IK Analyzer
Installation
My ES runs in Docker. Many of the tutorials online turned out to be unreliable, so after some tinkering I'm recording the working configuration here for future use.
The IK analyzer version must exactly match the installed ES version, just as with Kibana.
Go to https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v6.8.0 and pick the IK release that matches your ES version; mine is 6.8.0.
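Before downloading, it's worth confirming the exact ES version. A hedged sketch: the root endpoint of ES returns a JSON document containing a `version.number` field, which can be extracted with grep/sed (the live `curl` assumes ES on localhost:9200; the sample reply below is illustrative):

```shell
# Extract the "number" field from the JSON that ES returns at its root endpoint.
extract_es_version() {
  grep -o '"number"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Live check (assumes ES on localhost:9200):
#   curl -s localhost:9200 | extract_es_version
# Example with a captured reply:
SAMPLE='{"version":{"number":"6.8.0","build_flavor":"default"}}'
ES_VERSION=$(echo "$SAMPLE" | extract_es_version)
echo "$ES_VERSION"   # prints 6.8.0
```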
After entering the container, run cd plugins/; the directory you land in is actually /usr/share/elasticsearch/plugins:
[root@iz8vbfwigxwd3shlj9smd0z ~]# docker exec -it d886a17edc8f /bin/bash
[root@d886a17edc8f elasticsearch]# cd plugins/
[root@d886a17edc8f plugins]# pwd
/usr/share/elasticsearch/plugins
Create an ik directory and cd into it:
[root@d886a17edc8f plugins]# ls
[root@d886a17edc8f plugins]# mkdir ik
[root@d886a17edc8f plugins]# ls
ik
[root@d886a17edc8f plugins]# cd ik
Install IK online; again, watch that the versions match:
[root@d886a17edc8f plugins]# wget https://github.wuyanzheshui.workers.dev/medcl/elasticsearch-analysis-ik/releases/download/v6.8.0/elasticsearch-analysis-ik-6.8.0.zip
Unzip into the current directory (i.e. ik), then exit and restart the container:
[root@d886a17edc8f ik]# unzip elasticsearch-analysis-ik-6.8.0.zip
Archive: elasticsearch-analysis-ik-6.8.0.zip
creating: config/
inflating: config/quantifier.dic
inflating: config/stopword.dic
inflating: config/preposition.dic
inflating: config/main.dic
inflating: config/extra_main.dic
inflating: config/IKAnalyzer.cfg.xml
inflating: config/extra_single_word_full.dic
inflating: config/extra_stopword.dic
inflating: config/extra_single_word_low_freq.dic
inflating: config/suffix.dic
inflating: config/surname.dic
inflating: config/extra_single_word.dic
inflating: elasticsearch-analysis-ik-6.8.0.jar
inflating: httpclient-4.5.2.jar
inflating: httpcore-4.4.4.jar
inflating: commons-logging-1.2.jar
inflating: commons-codec-1.9.jar
inflating: plugin-descriptor.properties
inflating: plugin-security.policy
[root@d886a17edc8f ik]# ls
commons-codec-1.9.jar elasticsearch-analysis-ik-6.8.0.jar httpcore-4.4.4.jar
commons-logging-1.2.jar elasticsearch-analysis-ik-6.8.0.zip plugin-descriptor.properties
config httpclient-4.5.2.jar plugin-security.policy
[root@d886a17edc8f ik]# exit
exit
[root@iz8vbfwigxwd3shlj9smd0z ~]# docker restart d886a17edc8f
d886a17edc8f
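As an alternative to the manual wget-and-unzip route above, ES's bundled plugin CLI can install the same zip in one step. This is an untested sketch; the container id and version are the ones used in this post:

```shell
# Install IK through elasticsearch-plugin inside the container, then restart.
IK_URL='https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.8.0/elasticsearch-analysis-ik-6.8.0.zip'
docker exec -it d886a17edc8f bin/elasticsearch-plugin install "$IK_URL"
docker restart d886a17edc8f
```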
Testing
Note: the analyzer is named ik_max_word.
GET /ems/_analyze
{
  "analyzer": "ik_max_word",
  "text": "这个年轻人不简单"
}
# Tokenization result
{
  "tokens" : [
    {
      "token" : "这个",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "年轻人",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "年轻",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "人",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 3
    },
    {
      "token" : "不简单",
      "start_offset" : 5,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "不",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "CN_CHAR",
      "position" : 5
    },
    {
      "token" : "简单",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 6
    }
  ]
}
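The same test can be run with curl from the host instead of the Kibana console (a sketch assuming ES on localhost:9200 and the ems index used above). Note that IK also ships a coarser-grained ik_smart analyzer alongside ik_max_word:

```shell
# Reproduce the console request above with curl.
BODY='{"analyzer":"ik_max_word","text":"这个年轻人不简单"}'
curl -s -H 'Content-Type: application/json' \
     -X GET 'http://localhost:9200/ems/_analyze' -d "$BODY"
# Swap "ik_max_word" for "ik_smart" to get fewer, coarser-grained tokens.
```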
With that, the IK analyzer is installed successfully.